Abstract


When  using  item  response  theory  (IRT)  models  in  educational 
and  psychological  measurement,  it  is  standard  practice  to  estimate 
the  operating  characteristics  of  test  items  from  examinees'  item 
responses  alone.  This  is  the  final  report  of  a  project  that 
employed  Bayesian  and  empirical  Bayesian  methods  to  exploit 
additional  information  that  is  often  available  about  test  items 
(e.g.,  format,  content,  or  cognitive  processing  requirements)  or 
about  examinees  (e.g.,  educational  background  or  demographic 
status).  Practical  and  theoretical  results  obtained  in  a  series 
of  research  reports  are  summarized. 

Key words: Bayesian Estimation, Collateral Information, Differential Strategies, Empirical Bayes Estimation, Information Matrices, Item Response

Introduction


Item  response  theory  (IRT)  models  in  psychometrics  give  the 
probability  that  an  examinee  will  respond  correctly  to  a  given 
test  item  in  terms  of  parameters  for  just  that  examinee  and  that 
item.  This  formulation  makes  it  possible  to  solve  many  practical 
measurement  problems  that  are  difficult  or  intractable  under 
classical  test  theory,  including  adaptive  ability  testing,  large 
population  equating  studies,  and  test  construction  to  targeted 
operating  specifications. 

It  is  standard  practice  to  estimate  IRT  item  parameters 
solely  from  the  observed  responses  of  a  sample  of  examinees.  This 
project  was  motivated  by  a  desire  to  improve  estimation  by 
exploiting  collateral  information  that  is  often  available  about 
test  items  (e.g.,  format,  content,  or  cognitive  processing 
requirements)  or  about  examinees  (e.g.,  educational  background  or 
demographic  status).  Table  1  lists  the  reports  from  the  project 
exploring  both  practical  and  theoretical  aspects  of  the  problem. 
The  present  report  summarizes  the  main  results.  The  interested 
reader  is  referred  to  the  individual  papers  for  details, 
derivations,  and  examples. 


Table  1  about  here 


Incorporating  Collateral  Information  into  IRT 


The  initial  thrusts  of  the  project  were  to  determine  how  to 
incorporate  collateral  information  into  estimation  procedures  when 
the  IRT  model  is  correct,  and  to  gauge  its  impact  on  estimation 
precision.  Bayesian  and  empirical  Bayesian  methods  were  employed 
to this end. This section describes the basic model (Mislevy, 1987; in press).

Under an IRT model, the probability of response $x_j$ to Item j
with a possibly vector-valued item parameter $\beta_j$ from an examinee
with proficiency parameter $\theta$ is given as

$$P(x_j \mid \theta, \beta_j) = f(x_j \mid \theta, \beta_j) , \qquad (1)$$

where  the  form  of  the  item  response  function  f  is  known  up  to  the 
item  parameters.  Under  the  usual  assumption  of  local 
independence,  the  conditional  probability  of  the  response  pattern 
$x = (x_1, \ldots, x_n)$ to n test items is simply the product of
expressions like (1):

$$P(x \mid \theta, \beta) = \prod_j P(x_j \mid \theta, \beta_j) , \qquad (2)$$

where $\beta = (\beta_1, \ldots, \beta_n)$. Let the data matrix $X = (x_1, \ldots, x_N)$

represent response vectors observed from a sample of N examinees
from a population in which $\theta$ follows the density $p(\theta)$. The
likelihood for $\beta$ induced by X is obtained as

$$L_x(\beta \mid X) = \prod_i \int f(x_i \mid \theta, \beta) \, p(\theta) \, d\theta . \qquad (3)$$

"Marginal maximum likelihood" (MML) estimates of item parameters
(e.g., Bock and Aitkin, 1981) are obtained by maximizing (3) with
respect to $\beta$.
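Equations (1) through (3) can be illustrated with a minimal sketch. It assumes a Rasch form for f, a standard normal p(θ), and an equally spaced grid in place of the integral; the function names and the toy data are hypothetical and are not taken from the project reports.

```python
import math

def rasch_prob(x, theta, b):
    """Equation (1) for the Rasch model: P(x | theta, b), with x in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p if x == 1 else 1.0 - p

def pattern_prob(x, theta, b):
    """Equation (2): product over items under local independence."""
    prob = 1.0
    for x_j, b_j in zip(x, b):
        prob *= rasch_prob(x_j, theta, b_j)
    return prob

def normal_grid(mean=0.0, sd=1.0, n_points=61, half_width=6.0):
    """Equally spaced nodes with normalized normal-density weights (assumed p(theta))."""
    step = 2.0 * half_width / (n_points - 1)
    nodes = [mean + sd * (-half_width + q * step) for q in range(n_points)]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def marginal_loglik(X, b, nodes, weights):
    """Log of equation (3), with the integral replaced by a weighted sum."""
    total = 0.0
    for x in X:
        total += math.log(sum(w * pattern_prob(x, t, b) for t, w in zip(nodes, weights)))
    return total

# Toy data: four examinees, three items (hypothetical values).
X = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
b = [-0.5, 0.0, 0.5]
nodes, weights = normal_grid()
ll = marginal_loglik(X, b, nodes, weights)
```

MML estimation would maximize `marginal_loglik` over `b`; the sketch only evaluates it at one candidate value.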

Suppose  that  in  addition  to  item  responses,  values  of 
collateral  variables  y  are  also  available  from  examinees.  The 
appropriate marginal likelihood is now

$$L_{xy}(\beta \mid X, Y) = \prod_i \int f(x_i \mid \theta, \beta) \, p(\theta \mid y_i) \, d\theta . \qquad (4)$$


MML  estimates  of  item  parameters  that  exploit  collateral 
information  about  examinees  are  obtained  by  maximizing  (4)  with 
respect to $\beta$ (Mislevy, 1987).
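The only change from the likelihood without collateral information is that each examinee's integral uses a density for θ that is conditional on that examinee's y. A minimal sketch follows, assuming a Rasch model and a normal regression of θ on a single scalar y; the names `slope` and `resid_sd` and all numeric values are hypothetical.

```python
import math

def rasch_prob(x, theta, b):
    """P(x | theta, b) under the Rasch model, with x in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p if x == 1 else 1.0 - p

def normal_grid(mean, sd, n_points=61, half_width=6.0):
    """Equally spaced nodes around `mean` with normalized N(mean, sd^2) weights."""
    step = 2.0 * half_width / (n_points - 1)
    nodes = [mean + sd * (-half_width + q * step) for q in range(n_points)]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def marginal_loglik_xy(X, y, b, slope, resid_sd):
    """Log marginal likelihood with theta | y_i assumed N(slope * y_i, resid_sd^2)."""
    total = 0.0
    for x, y_i in zip(X, y):
        nodes, weights = normal_grid(slope * y_i, resid_sd)
        lik = 0.0
        for t, w in zip(nodes, weights):
            p = 1.0
            for x_j, b_j in zip(x, b):
                p *= rasch_prob(x_j, t, b_j)
            lik += w * p
        total += math.log(lik)
    return total

X = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
y = [1.0, 0.0, -1.0, 1.0]          # one collateral value per examinee
b = [-0.5, 0.0, 0.5]
ll_with_y = marginal_loglik_xy(X, y, b, slope=0.6, resid_sd=0.8)
ll_without_y = marginal_loglik_xy(X, y, b, slope=0.0, resid_sd=1.0)
```

Setting `slope=0.0` and `resid_sd=1.0` makes every examinee's density the same standard normal, so the function then reduces to the likelihood that ignores y.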

Bayesian  item  parameter  estimates  are  obtained  from  posterior 
distributions for $\beta$, which arise as the normalized product of a
likelihood function such as (3) or (4) and a prior distribution
for $\beta$, say $g(\beta)$. If, before observing data, one possesses no
information to differentiate expectations about the parameters of
different items, an exchangeable prior for $\beta$ is appropriate; that
is,  the  items  are  modeled  as  if  they  were  n  random  draws  from  the 
same  distribution.  In  this  case  the  posterior  distribution  is 
given by

$$p_x(\beta \mid X) \propto L_x(\beta \mid X) \prod_j g(\beta_j) \qquad (5)$$

or

$$p_{xy}(\beta \mid X, Y) \propto L_{xy}(\beta \mid X, Y) \prod_j g(\beta_j) , \qquad (6)$$


depending  on  whether  collateral  information  is  available  about 
examinees.  If  values  on  the  collateral  variable  z  are 
additionally available about items, they are incorporated as

$$p_{xz}(\beta \mid X, Z) \propto L_x(\beta \mid X) \prod_j g(\beta_j \mid z_j) \qquad (7)$$

or

$$p_{xyz}(\beta \mid X, Y, Z) \propto L_{xy}(\beta \mid X, Y) \prod_j g(\beta_j \mid z_j) \qquad (8)$$

(Mislevy,  in  press).  Standard  Bayesian  procedures  for  estimating 
item  and  population  parameters  that  do  not  employ  collateral 
information  extend  to  (7)  and  (8)  in  a  straightforward  manner 
(Mislevy,  1987,  in  press). 
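The effect of a prior that conditions on item features can be seen in a small sketch. It assumes Rasch difficulties, a normal prior for each item centered on a linear function of its features, and a normal approximation to the likelihood around hypothetical MML estimates `b_hat` with standard errors `se`; none of these choices or values comes from the cited reports.

```python
import math

def prior_means(Z, gamma):
    """Predicted difficulty of each item from its features z_j (assumed linear form)."""
    return [sum(g * z for g, z in zip(gamma, z_j)) for z_j in Z]

def log_item_prior(b, Z, gamma, tau):
    """Sum over items of log g(b_j | z_j), with g normal with standard deviation tau."""
    total = 0.0
    for b_j, m_j in zip(b, prior_means(Z, gamma)):
        total += -0.5 * ((b_j - m_j) / tau) ** 2 - math.log(tau * math.sqrt(2.0 * math.pi))
    return total

def approx_log_likelihood(b, b_hat, se):
    """Normal approximation to the log-likelihood around b_hat (an assumption)."""
    return sum(-0.5 * ((b_j - h) / s) ** 2 for b_j, h, s in zip(b, b_hat, se))

def log_posterior(b, b_hat, se, Z, gamma, tau):
    """Unnormalized log posterior: log-likelihood plus log prior given item features."""
    return approx_log_likelihood(b, b_hat, se) + log_item_prior(b, Z, gamma, tau)

def posterior_mode(b_hat, se, Z, gamma, tau):
    """Precision-weighted compromise between b_hat and the feature-based prediction."""
    modes = []
    for h, s, m in zip(b_hat, se, prior_means(Z, gamma)):
        w_data, w_prior = 1.0 / s ** 2, 1.0 / tau ** 2
        modes.append((w_data * h + w_prior * m) / (w_data + w_prior))
    return modes

Z = [[1, 0], [1, 1], [1, 2]]   # intercept plus one hypothetical feature count
gamma = [-1.0, 0.8]
b_hat = [-0.6, 0.1, 0.2]
se = [0.4, 0.4, 0.4]
tau = 0.5
modes = posterior_mode(b_hat, se, Z, gamma, tau)
```

Each posterior mode lies between the estimate from responses alone and the value predicted from the item's features, and it moves toward the prediction as `se` grows, which is the small-sample situation in which collateral information about items matters most.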

Increase  in  Information:  Theoretical  Results 
Using  general  results  about  missing  data  problems,  such  as 
Orchard  and  Woodbury's  (1972)  "missing  information  principle,"  it 
is  possible  to  derive  upper  and  lower  bounds  for  the  expected 
precision of item parameter estimates with and without collateral
information (Mislevy and Sheehan, 1988, in press). The results
are  expressed  most  easily  in  Bayesian  terms. 

Consider first the impact of collateral information about
examinees. Let $V(\beta \mid \theta, X, Y)$ represent the posterior variance of $\beta$
that would be obtained after observing values of not only item
responses x and collateral variables y from a sample of N
examinees, but values of their latent proficiencies $\theta$ as well.
Let analogous expressions represent the posterior variance of $\beta$ when
values of one or more types of variables are not observed; for
example, $V(\beta \mid X)$ when only item responses are observed. The
following relationships may be derived:

$$E[V(\beta \mid \theta, X, Y)] = E[V(\beta \mid \theta, X)] \le E[V(\beta \mid X, Y)] \le E[V(\beta \mid X)] ,$$

where $A \le B$ means that the matrix difference B-A is at least
positive semidefinite. Thus the precision of item parameter
estimation when using collateral information about examinees along
with item responses is at least as great as that expected when
using item responses alone, but cannot exceed the precision that
would be expected with the same sample size if values of the
latent variable $\theta$ could be observed as well.
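The two inequalities are instances of the usual decomposition of variance. The following is a generic sketch of that argument, not necessarily the route taken in the cited reports; $D_1$ and $D_2$ are placeholder symbols for nested sets of observed variables, with $D_2$ containing everything in $D_1$.

```latex
\begin{align*}
V(\beta \mid D_1) &= E\bigl[\,V(\beta \mid D_2) \mid D_1\bigr]
                   + V\bigl(E[\beta \mid D_2] \mid D_1\bigr), \\
E\bigl[V(\beta \mid D_1)\bigr] &= E\bigl[V(\beta \mid D_2)\bigr]
                   + E\bigl[V\bigl(E[\beta \mid D_2] \mid D_1\bigr)\bigr]
                   \;\ge\; E\bigl[V(\beta \mid D_2)\bigr].
\end{align*}
```

The last term is an expected covariance matrix and is therefore positive semidefinite. Taking $D_1 = X$ and $D_2 = (X, Y)$ gives the right-hand inequality; taking $D_1 = (X, Y)$ and $D_2 = (\theta, X, Y)$ gives the middle one. The equality on the left holds when y carries no further information about $\beta$ once $\theta$ and x are known, as when item responses depend on y only through $\theta$.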

An obvious lower bound holds for the impact of collateral
information about items:

$$E[V(\beta \mid X, Z)] \le E[V(\beta \mid X)] ;$$

that is, expected precision when using collateral information
about items in addition to item responses equals or exceeds the
precision expected when not using it. No ordering holds between
$E[V(\beta \mid X, Z)]$ and $E[V(\beta \mid \theta, X)]$ in general. In particular, when Z is
employed along with X, it is possible to exceed the precision
obtainable with $\theta$ and X.

Increase  in  Information:  Practical  Results 

By  examining  the  structure  of  information  matrices  with  and 
without  collateral  information,  and  by  applying  the  methods  to 
data  from  the  National  Assessment  of  Educational  Progress  (NAEP) 
and  the  Profile  of  American  Youth  surveys,  it  was  found  that 
modest  increases  in  the  precision  of  item  parameter  estimates  can 
be  achieved  by  using  collateral  information  (Mislevy,  1987,  in 
press;  Mislevy  and  Sheehan,  1988,  in  press). 

From collateral information about examinees, increases in
information depend on the strength of the relationship of the
collateral variables with $\theta$. In typical educational and
psychological settings where collateral information can often
account for about a third of the population variance, and with
item reliabilities typical of those settings, gains equivalent to
2 to 6 additional test items can be expected. This gain is
substantial when few responses are available from each examinee,
as in educational assessments, and may be useful in adaptive
testing where tests are short but well-targeted. It is
unimpressive in individual achievement testing, where tests of
sixty items or more are common.
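One rough way to see why the gain is on the order of a few items is to compare precisions under a normal approximation. This is a back-of-the-envelope sketch and not the calculation in the cited reports; the per-item information values below are hypothetical.

```python
def items_equivalent(r_squared, info_per_item, pop_var=1.0):
    """Extra precision about theta from collateral data, in units of one item's information.

    Assumes theta is normal with variance pop_var, and that conditioning on the
    collateral variables leaves residual variance pop_var * (1 - r_squared).
    """
    prior_precision = 1.0 / pop_var
    conditional_precision = 1.0 / (pop_var * (1.0 - r_squared))
    return (conditional_precision - prior_precision) / info_per_item

# A Rasch item matched exactly to the examinee carries information 0.25;
# a poorly matched item carries less (0.10 is a hypothetical value).
gain_well_matched = items_equivalent(1.0 / 3.0, 0.25)
gain_poorly_matched = items_equivalent(1.0 / 3.0, 0.10)
```

With a third of the variance explained, the precision rises from 1 to 1.5, so the gain is roughly two well-matched items or about five less informative ones under these assumed values.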

From collateral information about items, increases equivalent
to  hundred  and  fifty  additional  examinees  were  found  for  Rasch 
item  difficulty  parameters  in  a  junior  high  fractions  test 
(Mislevy,  in  press).  While  a  gain  of  this  magnitude  would  be 
unimpressive  in  applications  where  data  from  thousands  of 
examinees  is  already  at  hand,  it  is  meaningful  in  situations  when 
either  (1)  few  examinees  have  been  tested,  as  in  the  fractions 
example  or  in  local  testing  problems,  or  (2)  no  examinees  have 
been  tested,  as  when  approximating  item  statistics  for  newly- 
written  test  items. 

In  addition  to  small-sample  applications,  collateral 
information  about  items  can  play  an  important  role  in  both  item 
construction  and  diagnosis  regardless  of  sample  size.  The 
conditional distributions of item parameters, $p(\beta \mid z)$, express item
operating  characteristics  such  as  difficulty  in  terms  of  salient 
features  of  the  items.  To  the  degree  that  these  distributions 
succeed  in  explaining  item  operating  characteristics,  the  test 
constructor  can  manipulate  the  features  to  modify  items  in 
intended  ways  or  to  create  new  items  that  tap  the  same  essential 
skills.  To  the  degree  that  items  depart  from  the  centers  of  these 
predictive  distributions,  they  are  hard  or  easy  for  reasons  other 
than  those  held  most  important  in  describing  the  domain.  Outliers 
are suspect as flawed or irrelevant. The approach implied by (5)
and (6) is a step in the direction of integrating educational and
psychological theory into the measurement process. (Its
application  to  the  items  in  the  Document  Utilization  scale  of  the 
NAEP  Survey  of  Adult  Literacy  is  currently  in  progress.) 

When  Collateral  Information  Must  Be  Used 

The  preceding  sections  discuss  how,  when  all  examinees  are 
presented  all  items,  collateral  information  about  examinees  and 
items  may  be  exploited  to  obtain  more  precise  item  parameter 
estimates.  Consistent  estimates  are  still  obtained  in  this  case 
if  the  collateral  information  is  not  used  (Mislevy  and  Sheehan, 
in press). The same results apply when each examinee receives
only  a  random  subset  of  items. 

This  is  not  the  case  that  obtains  in  many  practical 
applications  of  IRT,  however.  In  order  to  obtain  more  information 
about  item  or  examinee  parameters  per  observed  response,  items  are 
often  administered  to  examinees  as  a  function  of  item  and  examinee 
collateral  variables.  Fourth  grade  students  may  be  presented  an 
easier  test  form  than  the  overlapping  form  fifth  graders  receive, 
for  example;  and  a  high  school  graduate  may  be  presented  a  harder 
item  first  in  an  adaptive  test  than  a  nongraduate.  In  order  to 
obtain  consistent  MML  item  parameter  estimates,  it  is  mandatory  to 
employ collateral information about examinees, i.e., to use (4)
rather than (3) (Mislevy and Sheehan, in press). In order to
obtain the correct Bayesian inferences, it is mandatory to use
collateral information about items as well, i.e., to base
inferences on (8) rather than (4) (Mislevy and Wu, 1988). Mislevy
and Sheehan (in press) give a simple counterexample with the Rasch
model  to  demonstrate  an  asymptotic  bias  in  item  parameter 
estimation  in  such  a  case  if  collateral  information  is  ignored. 

Modeling  Item  Responses  when  Different  Examinees 
Follow  Different  Solution  Strategies 

Initial work on using collateral information about items
assumed that the IRT model was strictly correct. Thinking about
the features of items that made them easy or hard, however, made
it clear that difficulty depends on the way that the examinees are
attempting to arrive at their answers. In particular, different
features of items can make them differentially difficult for
examinees who follow different solution strategies. This insight
led to the formulation of a mixture of IRT models (Mislevy and
Verhelst, in press). Resolving the mixture demands a type of
collateral information that plays no role whatsoever in
traditional psychometrics, including standard IRT: psychological
theory about the different strategies that examinees might follow.

The key idea is to model item difficulty in terms of salient
item features, that is, features that tend to make an item easy or
difficult under various strategies. The Mislevy-Verhelst model
makes the following assumptions:

1. A finite number of known solution strategies apply.

2. Each examinee is applying one and only one of these strategies
for all the items in the set.

3. The responses of an examinee are observed but the strategy
he or she has employed is not.

4. The responses of examinees following Strategy k conform to
an item response model of a known form.

5. Substantive theory posits relationships between observable
features of items and the probabilities of success enjoyed by
members of each strategy class. The relationships may be
known either fully or only partially, e.g., known as to
parametric form but not parameter values.

Let $\theta = (\theta_1, \ldots, \theta_K)$ be an examinee proficiency parameter,
with the element $\theta_k$ corresponding to proficiency if Strategy k is
employed. Let $\phi = (\phi_1, \ldots, \phi_K)$ be an examinee strategy parameter,
with all elements zero except for the single element $\phi_k$
corresponding  to  the  strategy  that  is  employed;  this  element  takes 
the  value  1.  Let  the  operating  characteristics  of  Item  j  under 
Strategy k be given as follows:

$$P[x_j \mid \theta_k, \beta_k(z_{jk} \mid \alpha), \phi_k = 1] = f_k[x_j \mid \theta_k, \beta_k(z_{jk} \mid \alpha)] , \qquad (9)$$


where $\beta_k(z_{jk} \mid \alpha)$, the item parameter for Item j that applies when
examinees follow Strategy k, depends on its salient features $z_{jk}$
under that strategy and a relatively small number of basic
strategy parameters $\alpha$. The MML function for estimating $\alpha$ induced
by the data matrix X from a sample of N examinees and the
item/strategy collateral variables Z is obtained as

$$L(\alpha \mid X, Z) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \int \prod_{j=1}^{n} f_k[x_{ij} \mid \theta, \beta_k(z_{jk} \mid \alpha)] \, g_k(\theta) \, d\theta , \qquad (10)$$

where $g_k$ is the density of $\theta_k$ among those examinees following
Strategy k, and $\pi_k$ is the proportion of the population who do so.
If the $g_k$ and the $\pi_k$ are not known, they too can be estimated via
MML by maximizing (10) with respect to them as well.
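The mixture likelihood in (10) can be sketched as follows. The sketch assumes a Rasch form for every strategy, a difficulty that is linear in the item features, normal proficiency densities within strategies, and a grid in place of the integral; all names and numbers are hypothetical.

```python
import math

def rasch_prob(x, theta, b):
    """P(x | theta, b) under the Rasch model, with x in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p if x == 1 else 1.0 - p

def difficulty(z_jk, alpha):
    """Item difficulty under one strategy as a linear function of features (assumed form)."""
    return sum(a * z for a, z in zip(alpha, z_jk))

def normal_grid(mean, sd, n_points=61, half_width=6.0):
    """Equally spaced nodes with normalized N(mean, sd^2) weights."""
    step = 2.0 * half_width / (n_points - 1)
    nodes = [mean + sd * (-half_width + q * step) for q in range(n_points)]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def mixture_loglik(X, Z, alpha, pis, means, sds):
    """Log of (10). Z[k][j] holds the features of Item j under Strategy k."""
    total = 0.0
    for x in X:
        lik = 0.0
        for k, pi_k in enumerate(pis):
            nodes, weights = normal_grid(means[k], sds[k])
            b_k = [difficulty(z_jk, alpha) for z_jk in Z[k]]
            for t, w in zip(nodes, weights):
                p = 1.0
                for x_j, b_j in zip(x, b_k):
                    p *= rasch_prob(x_j, t, b_j)
                lik += pi_k * w * p
        total += math.log(lik)
    return total

# Two strategies, three items; each feature vector is (intercept, step count).
Z = [[[1, 1], [1, 2], [1, 3]],    # features under Strategy 1
     [[1, 3], [1, 2], [1, 1]]]    # features under Strategy 2
alpha = [-1.0, 0.5]
pis, means, sds = [0.7, 0.3], [0.0, 0.5], [1.0, 1.0]
X = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
ll = mixture_loglik(X, Z, alpha, pis, means, sds)
```

Estimation would maximize `mixture_loglik` over `alpha` and, where they are unknown, over the mixing proportions and the parameters of the proficiency densities.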

If the $\alpha$, $g_k$, and $\pi_k$ are known or well estimated, it is
possible to calculate for a given examinee the probability that
his response vector was produced under a given strategy and to
estimate his ability under each possibility. By Bayes theorem,
the posterior probability of Strategy k and proficiency $\theta$ under
that strategy is obtained as

$$P(\theta, \phi_k = 1 \mid x) = C \, f_k(x \mid \theta, \beta_k(z_k)) \, g_k(\theta) \, \pi_k ,$$

where C is the normalizing constant obtained as

$$C^{-1} = \sum_k \int f_k(x \mid \theta, \beta_k(z_k)) \, g_k(\theta) \, d\theta \; \pi_k .$$

The posterior probability that Strategy k was employed is

$$P(\phi_k = 1 \mid x) = \int P(\theta, \phi_k = 1 \mid x) \, d\theta$$

and the posterior mean proficiency conditional on $\phi_k = 1$ (i.e.,
supposing that Strategy k was used) is

$$E(\theta_k \mid x, \phi_k = 1) = \int \theta \, P(\theta, \phi_k = 1 \mid x) \, d\theta \; P^{-1}(\phi_k = 1 \mid x) .$$
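These posterior quantities are simple to compute once the item difficulties under each strategy are fixed. A minimal sketch, assuming Rasch response functions, normal proficiency densities, and a grid approximation to the integrals; the difficulties and mixing proportions are hypothetical.

```python
import math

def rasch_prob(x, theta, b):
    """P(x | theta, b) under the Rasch model, with x in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p if x == 1 else 1.0 - p

def normal_grid(mean, sd, n_points=61, half_width=6.0):
    """Equally spaced nodes with normalized N(mean, sd^2) weights."""
    step = 2.0 * half_width / (n_points - 1)
    nodes = [mean + sd * (-half_width + q * step) for q in range(n_points)]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def strategy_posterior(x, b_by_strategy, pis, means, sds):
    """Posterior strategy probabilities and conditional posterior mean proficiencies."""
    masses, moments = [], []
    for b_k, pi_k, m, s in zip(b_by_strategy, pis, means, sds):
        nodes, weights = normal_grid(m, s)
        mass, moment = 0.0, 0.0
        for t, w in zip(nodes, weights):
            p = pi_k * w                      # pi_k * g_k(theta)
            for x_j, b_j in zip(x, b_k):
                p *= rasch_prob(x_j, t, b_j)  # times f_k(x | theta, beta_k)
            mass += p
            moment += t * p
        masses.append(mass)
        moments.append(moment)
    c_inverse = sum(masses)                   # the normalizing constant, inverted
    probs = [mass / c_inverse for mass in masses]
    post_means = [mom / mass for mom, mass in zip(moments, masses)]
    return probs, post_means

# Item 1 is easy and Item 3 hard under Strategy 1; the reverse under Strategy 2.
b_by_strategy = [[-1.5, 0.0, 1.5], [1.5, 0.0, -1.5]]
probs, post_means = strategy_posterior([1, 1, 0], b_by_strategy,
                                       [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```

For the pattern (1, 1, 0), success on the first item and failure on the third is more plausible under the first strategy, so its posterior probability rises above the prior value of one half.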

The significance of this model lies in its ability to express
how examinees solve items rather than just how many they solve.
The latter is all that the standard models of test theory can do.
Areas  of  potential  benefit  include  psychological  investigations  of 
alternative  processing  models,  educational  decisions  involving 
level  of  understanding,  and  determinations  of  alternative  mental 
models  in  problem  solving.  The  approach  opens  the  door  to  such 
applications  as  (1)  adaptive  testing  schemes  designed  to  infer  how 
examinees  solve  problems  as  well  as  how  well  they  solve  them,  and 
(2)  studies  of  changes  in  the  structure  as  well  as  the  level  of 
intelligence  in  the  course  of  human  development. 

Inferring  Examinee  Ability  When  Some  Item  Responses  Are  Missing 

In  practical  applications  of  item  response  theory  (IRT), 
there  are  several  reasons  that  item  responses  may  not  be  observed 
from  all  examinees  to  all  test  items.  The  reason  most  germane  to 
the  collateral  information  problem  is  the  intentional 
administration  of  only  subsets  of  items  to  examinees,  with  the 
subset  depending  on  collateral  information.  It  was  mentioned 
above  that  collateral  information  must  be  taken  into  account  in 
these  cases.  In  addition  to  this  type  of  missingness,  Mislevy  and 
Wu  (1988)  studied  problems  of  inference  that  arise  with  several 
other  types  of  missingness  that  arise  frequently  in  IRT. 

To  preface  the  results  of  their  study,  we  review  Rubin's 
(1976) notions about "ignorability" of missing data. Ignoring the
missingness process under direct likelihood inference means using
a pseudo-likelihood that includes terms for only the responses
that were observed, without regard for the processes by which they
came to be observed. The resulting inferences are appropriate if
the pseudo-likelihood is proportional to the correct likelihood
that does account for the missingness process. In this case the
correct value of the maximum likelihood estimate (MLE) is
obtained. Sampling-distribution inferences based on the MLE are
appropriate only if the missingness pattern does not depend on the
values of the observed data. When this condition holds, sampling-
distribution inferences can be drawn with regard to repeated
samples of responses to only those items whose responses were
observed. The missingness process is ignorable with respect to
Bayesian inference if the correct Bayesian posterior is
proportional to the product of the pseudo-likelihood and an
appropriate prior distribution.
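The pseudo-likelihood described here simply skips the unobserved responses. A minimal sketch for estimating proficiency with known item parameters, assuming a Rasch model, `None` as the marker for a missing response, and a grid search for the maximum; all values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def observed_loglik(theta, x, b):
    """Pseudo-log-likelihood: terms for observed responses only (None = missing)."""
    total = 0.0
    for x_j, b_j in zip(x, b):
        if x_j is None:
            continue  # the missingness process itself contributes no term
        p = rasch_p(theta, b_j)
        total += math.log(p if x_j == 1 else 1.0 - p)
    return total

def grid_mle(x, b, lo=-4.0, hi=4.0, steps=801):
    """Maximize the pseudo-likelihood over an equally spaced grid of theta values."""
    grid = [lo + (hi - lo) * q / (steps - 1) for q in range(steps)]
    return max(grid, key=lambda t: observed_loglik(t, x, b))

b = [-1.0, -0.5, 0.0, 0.5, 1.0]
x = [1, 1, None, 0, None]       # items 3 and 5 were not observed
theta_hat = grid_mle(x, b)
```

Whether `theta_hat` is the correct MLE depends on why the two responses are missing; the code is the same in every case, which is exactly why the ignorability conditions have to be checked separately.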

For five common types of missingness in IRT, Mislevy and Wu
first used Rubin's (1976) theorems to determine whether
ignorability holds under direct likelihood and Bayesian inference
about examinee parameters $\theta$ when item parameters are known. In
those cases in which the correct value of the MLE is obtained
under direct likelihood inference, they asked whether sampling
distribution inferences based on the MLE were appropriate. They
then considered the analogous questions for inferences about $\beta$
when the examinee parameters are eliminated by marginalization, as
in (3)-(8). The findings are summarized below. Tables 2 and 3
highlight the results on ignorability.


Tables  2  and  3  about  here 


Case  1:  Alternate  Test  Forms.  When  an  examinee  is  assigned 
one  of  several  alternative  test  forms  by  a  random  process  such  as 
a  coin  flip  or  a  spiralling  scheme,  the  process  that  renders 
missing  the  responses  to  items  on  the  forms  not  presented  is 
ignorable for all three types of inference, both for estimating
$\beta$ and for estimating $\theta$ when $\beta$ is known.

Case  2:  Targeted  Testing.  When  collateral  variables  such  as 
educational  or  demographic  status  are  used  to  assign  an  examinee 
one  of  several  test  forms  that  differ  in  their  measurement 
properties,  the  resulting  missingness  on  forms  not  given  is 
ignorable under direct likelihood inference for $\theta$ given $\beta$, but not
under Bayesian inference unless the prior information about
examinees that led to differential assignments is conditioned on.
This information must be taken into account for both likelihood
and Bayesian inferences about $\beta$; for Bayesian inference, prior
information about $\beta$ used to select items must additionally be
taken into account. Sampling distribution inferences may be based
on MLEs for $\beta$ and for $\theta$ given $\beta$, conditional on the observed
patterns of form administration within values of the examinee
variables used for targeting.

It should be emphasized that these conclusions depend on the
veracity  of  the  IRT  model.  In  particular,  it  is  necessary  that 
the  regression  of  a  correct  response  on  ability  be  invariant  with 
respect  to  collateral  information.  This  assumption  may  well  fail 
in  a  situation  of  currently  increasing  interest:  An  item  pool  is 
calibrated  using  an  IRT  model,  and  a  school  is  allowed  to  measure 
students  using  only  those  items  it  deems  relevant  to  its 
curriculum.  If  students  from  different  schools  have  had  different 
opportunities  to  learn  the  skills  tapped  by  different  items,  then 
tailoring  tests  to  their  strengths  leads  almost  certainly  to  item 
by school by ability interactions, a violation of the IRT model.
Estimates  for  schools  and  individuals  within  schools  tend  to 
overestimate  the  scores  they  would  have  received  had  they  been 
given  all  items,  or  randomly  selected  subsets  of  items.  This  use 
of  IRT  may  hold  practical  value  nonetheless,  provided  that  such 
scores  are  viewed  not  as  consistent  estimates  of  performance  in 
the  total  pool  but  as  indicators  of  a  kind  of  maximal  performance. 

Case  3:  Adaptive  Testing.  In  adaptive  testing,  item 
assignment  proceeds  item  by  item  for  each  examinee  according  to 
the  values  of  his  responses  to  preceding  items.  The  same 
conclusions  as  for  Case  2  hold  for  direct  likelihood  and  Bayesian 
inference.  Ignorability  under  direct  likelihood  inference  means 
that the correct points are identified as MLEs of $\theta$ given $\beta$ and of
$\beta$. The usual MLE properties under sampling-distribution inference
need not hold, however, because the probabilities of missingness
patterns depend on the values of observed responses.

Case 4: Not-reached Items. When some examinees run out of
time before they see the last items on a nearly nonspeeded test,
the  not-reached  process  is  ignorable  with  respect  to  direct 
likelihood inference about $\theta$ given $\beta$, and the MLE supports
sampling distribution inferences that pertain to repeated
administrations of the items that were actually reached. This
missingness process is not ignorable under Bayesian inference
unless speed and ability are independent. And only then can
direct likelihood inferences about $\beta$ ignore the missingness.
Furthermore, Bayesian inferences about $\beta$ require that collateral
variables  for  items  be  employed  if  they  played  a  role  in 
determining  which  items  would  not  be  reached,  as  when  items  are 
ordered  from  easy  to  hard. 

Case  5:  Intentional  Omission.  When  examinees  are  presented 
items,  have  a  chance  to  appraise  their  content,  and  decide  for 
their  own  reasons  not  to  respond,  the  missingness  is  not 
ignorable.  Inferences  must  be  drawn  from  a  full  model  for  the 
joint  distribution  of  missingness  and  item  response. 

Not  surprisingly,  modeling  this  nonignorable  nonresponse  is 
difficult.  Neither  of  the  two  most  ambitious  approaches  proposed 
to  date,  namely  Lord's  (1983)  model  for  omits  and  the  use  of 
multiple-category  IRT  models  (e.g.,  Bock,  1972),  handles  the  issue 
of  local  independence  in  a  fully  satisfactory  manner.  Under 
Lord's  (1983)  model,  the  marginal  model  for  item  responses  is  not 
a standard IRT model depending on $\theta$ alone and exhibiting local
independence. Under the multiple-category model approach, local
independence fails unless all examinees at any given ability level
have  the  same  propensity  to  omit  items  they  are  unsure  of,  rather 
than  guess  at  random. 

If  one  assumes  that  examinees  are  perfect  judges  of  their 
chances  of  responding  correctly,  and  omit  only  if  it  is  in 
accordance  with  the  strategy  that  maximizes  their  expected  score, 
Lord's  (1974)  treatment  of  omits  as  fractionally  correct  can  be 
justified  as  providing  the  expectation  of  a  conditional  term  in 
the full likelihood for omission probabilities and correct-response
probabilities. This procedure is readily incorporated
into  standard  complete-data  IRT  algorithms  and  avoids  having  to 
specify  the  full  likelihood,  but  sacrifices  information  about 
examinee  and  item  parameters  conveyed  by  the  observed  pattern  of 
missingness.  Given  the  complexity  of  models  for  the  full 
likelihood,  however,  this  expedient  seems  to  be  a  good  practical 
choice, provided that, as Lord urges, examinees are clearly
informed  about  how  omits  will  be  scored  and  which  omitting 
strategy  maximizes  their  chances  of  scoring  well. 
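Scoring omits as fractionally correct fits into a complete-data likelihood by letting the response weight take a value between 0 and 1. The sketch below assumes a Rasch model for simplicity and assumes that the fractional value for an omitted item is the reciprocal of its number of response options; both choices, and all numeric values, are illustrative assumptions rather than details taken from the text.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fractional_loglik(theta, x, b, n_options):
    """Complete-data log-likelihood with omits (None) scored as 1 / n_options."""
    total = 0.0
    for x_j, b_j, a_j in zip(x, b, n_options):
        u = 1.0 / a_j if x_j is None else float(x_j)  # fractional credit for an omit
        p = rasch_p(theta, b_j)
        total += u * math.log(p) + (1.0 - u) * math.log(1.0 - p)
    return total

b = [-1.0, 0.0, 1.0]
n_options = [4, 4, 4]
with_omit = fractional_loglik(0.2, [1, None, 0], b, n_options)
```

Because an omit enters through the same expression as an observed response, an existing complete-data routine needs no structural change, which is the practical appeal noted above.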

Conclusion 

Although  collateral  information  about  examinees  and  items  is 
rarely employed in item response theory (IRT), it is straightforward
to incorporate it using Bayesian and empirical Bayesian
methods.  If  the  IRT  model  is  correct  and  examinees  are  assigned 
items  independently  of  values  on  collateral  variables,  then 
collateral  information  can  be  used  to  improve  item  parameter 
estimation modestly. Employing collateral information is
mandatory to obtain correct Bayesian and empirical Bayesian
inferences  if  it  was  used  to  assign  items  to  examinees. 

Aside  from  considerations  of  efficiency,  employing  collateral 
information  about  items  is  a  step  toward  integrating  educational 
and psychological theory into the measurement process. Two
aspects  of  this  idea  were  developed  in  the  course  of  the  project. 

The  first,  which  takes  a  more  traditional  measurement 
perspective,  assumes  that  a  single  IRT  model  provides  an 
acceptable  fit  to  the  data  of  interest.  Modeling  items'  operating 
characteristics  in  terms  of  salient  features  can  make  estimation 
more  precise,  but  more  importantly  it  elucidates  the  reasons  that 
items  are  hard  or  easy,  and  why  some  are  more  discriminating  than 
others.  A  formal  framework  is  thus  available  for  item 
construction  and  diagnosis,  expressing  relationships  among 
substantive  theory,  item  features,  and  measurement  properties. 

The  second  is  a  response  to  a  growing  awareness  of  the  fact 
that  traditional  psychometric  models  (IRT  as  well  as  classical 
test  theory)  measure  what  is  essentially  an  overall  level  of 
proficiency, losing in the process qualitative differences among
examinees  that  arise  from  different  cognitive  solution  strategies. 
In  order  to  extend  psychometric  analysis  to  these  problems,  and  to 
bring  to  bear  the  findings  of  recent  research  upon  applied 
measurement  problems,  it  is  mandatory  to  employ  collateral 
information  about  examinees  and  items  that  bears  upon  the  ways 
that  people  solve  problems.  A  mixture  of  IRT  models  that  applies 
to  some  problems  of  this  type  was  introduced  in  the  project. 
Table 2

Ignorability Results for Estimating $\theta$ Given $\beta$

                      | Type of Inference
Type of Missingness   | Direct Likelihood | Bayesian                                       | Sampling Distribution
----------------------+-------------------+------------------------------------------------+----------------------
Alternate Forms       | Yes               | Yes                                            | Yes
Targeted Forms        | Yes               | Yes, given examinee variables                  | Yes
Adaptive Testing      | Yes               | Yes, given examinee variables if they are used | No
Not-Reached           | Yes               | No, unless speed and ability are independent   | Yes
Intentional Omissions | No                | No                                             | No

Conditional on the observed pattern of missingness.


Table 3

Ignorability Results for Estimating $\beta$ After Marginalizing over $\theta$

                      | Type of Inference
Type of Missingness   | Direct Likelihood                              | Bayesian                                                          | Sampling Distribution
----------------------+------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------
Alternate Forms       | Yes                                            | Yes                                                               | Yes
Targeted Forms        | Yes, given examinee variables                  | Yes, given examinee and item variables                            | Yes, given examinee variables
Adaptive Testing      | Yes, given examinee variables if they are used | Yes, given item variables and examinee variables if they are used | No
Not-Reached           | No, unless speed and ability are independent   | No, unless speed and ability are independent                      | No, unless speed and ability are independent
Intentional Omissions | No                                             | No                                                                | No

Conditional on the observed pattern of missingness.